Computer Engineering (计算机工程)

A Lightweight Dependency Analysis Method for Software Component Analysis

  • Published: 2025-04-16

Abstract: Open source software is now widely used across industries, particularly in critical fields such as aerospace and automotive electronics, yet most open source software contains security vulnerabilities. In China, component analysis and verification of open source software are seriously lacking in software development and evaluation, which makes the security of software in critical fields difficult to guarantee. Software component analysis is therefore indispensable for ensuring software security, and accurate identification of third-party dependencies (TPDs) is the key to software vulnerability management and compliance assessment. To address these problems, this paper proposes a lightweight dependency analysis method for software component analysis that improves the accuracy of TPD identification and the efficiency of processing large-scale project files. The main contributions are as follows. First, the method includes an analysis algorithm for Java Maven projects, which identifies the project's build configuration files, constructs a model of the project structure, and extracts third-party dependency information. Second, the method includes a redundant dependency detection algorithm based on the Winnowing algorithm, which compares, step by step, the hash fingerprints of the code files with those of the identified third-party dependencies in order to determine which dependencies are actually used and to exclude redundant ones. Third, based on the proposed algorithms, a lightweight component analysis framework is designed and implemented; the framework wraps each analysis algorithm in a dedicated analyzer class and uses the Java ServiceLoader API to register and execute analysis tasks. To verify the effectiveness of the method, we built a database containing 56,870 different versions of TPDs and collected four real open source projects from GitHub for experimental validation. The results show that the proposed algorithm performs well in detection accuracy: compared with a machine-learning-based clustering algorithm and a technique based on code similarity comparison, it achieves higher accuracy and F1 score with lower detection time. In addition, the use of the ServiceLoader API makes the system highly extensible and makes it easy to add different analysis algorithms, which gives the system strong practical value and lays the foundation for subsequent multi-language TPD detection.

Keywords: software component analysis; dependency detection; Java language; Maven tool; software system
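The fingerprint comparison mentioned in the abstract can be illustrated with a minimal sketch of the general Winnowing technique. This is not the authors' implementation: the function names (`kgram_hashes`, `winnow`, `containment`), the parameters `k=5` and `window=4`, the whitespace and case normalization, and the truncated SHA-1 hash are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the normalized text.

    Normalization (an assumption of this sketch): remove all whitespace
    and fold to lower case.
    """
    normalized = "".join(text.split()).lower()
    return [
        int(hashlib.sha1(normalized[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
        for i in range(len(normalized) - k + 1)
    ]


def winnow(text, k=5, window=4):
    """Return the set of Winnowing fingerprints of text.

    Each sliding window over the k-gram hashes contributes its minimum
    hash. Positions are not recorded here, so the usual tie-breaking
    rule for equal minima does not affect the result.
    """
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {
        min(hashes[start:start + window])
        for start in range(len(hashes) - window + 1)
    }


def containment(code_text, dependency_text, k=5, window=4):
    """Fraction of the dependency's fingerprints that also occur in the code."""
    dep = winnow(dependency_text, k, window)
    if not dep:
        return 0.0
    return len(dep & winnow(code_text, k, window)) / len(dep)
```

In this sketch, a dependency whose fingerprints rarely or never appear in the project's code files would be a candidate redundant dependency; the threshold for that decision, and the order in which files are compared, are left open because the abstract does not state them.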